Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

Abstract

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For T rounds, K actions, and d-dimensional feature vectors, we prove a regret bound of $$O\left(\left(1 + \rho + \frac{1}{\rho}\right) d \ln T \ln \frac{K}{\delta} \sqrt{d K T^{1 + 2\epsilon} \ln \frac{K}{\delta} \frac{1}{\epsilon}}\right)$$ that holds with probability 1 − δ under the mean-variance criterion with risk tolerance ρ, for any $$0 < \epsilon < \frac{1}{2},\ 0 < \delta < 1$$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.
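The setting in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm or the variant analyzed there. It assumes that the mean-variance score of an arm has the form ρ · (mean reward) − (reward variance), that each arm keeps its own Gaussian posterior over a linear parameter (the disjoint model), and that the variance is estimated from running squared prediction residuals. The names `DisjointLinearTS`, `run_demo`, `rho`, and `v` are hypothetical and chosen only for this illustration.

```python
import numpy as np


class DisjointLinearTS:
    """Illustrative Thompson sampling with one linear model per arm
    and an assumed mean-variance score: rho * mean - variance."""

    def __init__(self, n_arms, dim, rho=1.0, v=1.0, seed=0):
        self.rho = rho                      # risk tolerance (assumed role)
        self.v = v                          # posterior scale (hypothetical knob)
        self.rng = np.random.default_rng(seed)
        self.B = np.stack([np.eye(dim) for _ in range(n_arms)])  # precision per arm
        self.f = np.zeros((n_arms, dim))    # sum of reward-weighted contexts
        self.n = np.zeros(n_arms)           # pull counts
        self.sq_resid = np.zeros(n_arms)    # sum of squared prediction residuals

    def variance_estimate(self, arm):
        # Placeholder value of 1.0 until the arm has been observed (assumption).
        if self.n[arm] == 0:
            return 1.0
        return self.sq_resid[arm] / self.n[arm]

    def select(self, contexts):
        scores = np.empty(len(contexts))
        for k, x in enumerate(contexts):
            cov = np.linalg.inv(self.B[k])
            cov = (cov + cov.T) / 2.0       # keep the covariance symmetric
            mu_hat = cov @ self.f[k]
            theta = self.rng.multivariate_normal(mu_hat, self.v ** 2 * cov)
            scores[k] = self.rho * (x @ theta) - self.variance_estimate(k)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        # Residual is taken against the estimate held before this update.
        mu_hat = np.linalg.solve(self.B[arm], self.f[arm])
        self.sq_resid[arm] += (reward - x @ mu_hat) ** 2
        self.n[arm] += 1
        self.B[arm] += np.outer(x, x)
        self.f[arm] += reward * x


def run_demo(rounds=2000, seed=1):
    """Two synthetic arms with equal mean payoff but different noise levels."""
    rng = np.random.default_rng(seed)
    thetas = np.array([[1.0, 0.0], [1.0, 0.0]])
    noise_sd = np.array([0.1, 2.0])
    agent = DisjointLinearTS(n_arms=2, dim=2, rho=1.0, seed=seed)
    pulls = np.zeros(2, dtype=int)
    for _ in range(rounds):
        contexts = rng.uniform(0.0, 1.0, size=(2, 2))
        arm = agent.select(contexts)
        reward = contexts[arm] @ thetas[arm] + rng.normal(0.0, noise_sd[arm])
        agent.update(arm, contexts[arm], reward)
        pulls[arm] += 1
    return pulls
```

Calling `run_demo()` returns the number of pulls per arm on synthetic data. Because both arms share the same expected payoff, the variance term alone should steer the sketch toward the low-noise arm, which is the behavior a risk-averse criterion is meant to produce.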

Similar resources

The multi-armed bandit problem with covariates

We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewa...

Multi-armed bandit problem with precedence relations

Abstract: Consider a multi-phase project management problem where the decision maker needs to deal with two issues: (a) how to allocate resources to projects within each phase, and (b) when to enter the next phase, so that the total expected reward is as large as possible. We formulate the problem as a multi-armed bandit problem with precedence relations. In Chan, Fuh and Hu (2005), a class of ...

Multi-armed bandit problem with known trend

We consider a variant of the multi-armed bandit model, which we call multi-armed bandit problem with known trend, where the gambler knows the shape of the reward function of each arm but not its distribution. This new problem is motivated by different on-line problems like active learning, music and interface recommendation applications, where when an arm is sampled by the model the received re...

Combinatorial Multi-Objective Multi-Armed Bandit Problem

In this paper, we introduce the COmbinatorial Multi-Objective Multi-Armed Bandit (COMOMAB) problem that captures the challenges of combinatorial and multi-objective online learning simultaneously. In this setting, the goal of the learner is to choose an action at each time, whose reward vector is a linear combination of the reward vectors of the arms in the action, to learn the set of super Par...

Multi-objective Contextual Multi-armed Bandit Problem with a Dominant Objective

In this paper, we propose a new multi-objective contextual multi-armed bandit (MAB) problem with two objectives, where one of the objectives dominates the other objective. Unlike single-objective MAB problems in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem, the learner obtains a random reward vector, where each component of the reward vector ...

Journal

Journal title: Journal of Systems Science and Systems Engineering

Year: 2022

ISSN: 1861-9576, 1004-3756

DOI: https://doi.org/10.1007/s11518-022-5541-9